ULU: A Unified Activation Function

Huo, Simin

arXiv.org Artificial Intelligence

We propose ULU, a novel non-monotonic, piecewise activation function defined as $\{f(x;\alpha_1),\ x<0;\ f(x;\alpha_2),\ x\ge 0\}$, where $f(x;\alpha)=0.5x(\tanh(\alpha x)+1)$, $\alpha>0$. ULU treats positive and negative inputs differently. Extensive experiments demonstrate ULU significantly outperforms ReLU and Mish across image classification and object detection tasks. Its variant Adaptive ULU (AULU) is expressed as $\{f(x;\beta_1^2),\ x<0;\ f(x;\beta_2^2),\ x\ge 0\}$, where $\beta_1$ and $\beta_2$ are learnable parameters, enabling it to adapt its response separately for positive and negative inputs. Additionally, we introduce the LIB (Like Inductive Bias) metric from AULU to quantitatively measure the inductive bias of the model.
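The piecewise definition in this abstract can be written out directly. The following is a minimal scalar sketch in plain Python, transcribed from the formulas above; the function names `f`, `ulu`, and `aulu` and the example values of the parameters are illustrative assumptions, not the paper's code or its chosen settings.

```python
import math


def f(x: float, alpha: float) -> float:
    """f(x; alpha) = 0.5 * x * (tanh(alpha * x) + 1), defined for alpha > 0."""
    return 0.5 * x * (math.tanh(alpha * x) + 1.0)


def ulu(x: float, alpha1: float, alpha2: float) -> float:
    """ULU: use alpha1 for negative inputs and alpha2 for non-negative inputs."""
    return f(x, alpha1) if x < 0 else f(x, alpha2)


def aulu(x: float, beta1: float, beta2: float) -> float:
    """AULU: same shape as ULU, with slopes beta1**2 and beta2**2.

    Squaring keeps the effective slope non-negative while beta1 and beta2
    remain unconstrained (in the paper they are learnable; here they are
    plain floats).
    """
    return ulu(x, beta1 ** 2, beta2 ** 2)


if __name__ == "__main__":
    # alpha1=0.5 and alpha2=2.0 are placeholder values chosen for illustration.
    for x in (-20.0, -1.0, 0.0, 1.0, 10.0):
        print(x, ulu(x, alpha1=0.5, alpha2=2.0))
```

For large positive inputs the output approaches `x`, and for large negative inputs it approaches zero from below, which is where the non-monotonic dip on the negative side comes from.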


Deriving Activation Functions via Integration

Huang, Allen Hao

arXiv.org Artificial Intelligence

Activation functions play a crucial role in introducing non-linearities to deep neural networks. We propose a novel approach to designing activation functions by focusing on their gradients and deriving the corresponding functions through integration. Our work introduces the Expanded Integral of the Exponential Linear Unit (xIELU), a trainable piecewise activation function derived by integrating trainable affine transformations applied on the ELU activation function. xIELU combines two key gradient properties: a trainable and linearly increasing gradient for positive inputs, similar to ReLU$^2$, and a trainable negative gradient flow for negative inputs, akin to xSiLU. Conceptually, xIELU can be viewed as extending ReLU$^2$ to effectively handle negative inputs. In experiments with 1.1B parameter Llama models trained on 126B tokens of FineWeb Edu, xIELU achieves lower perplexity compared to both ReLU$^2$ and SwiGLU when matched for the same compute cost and parameter count.
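The idea of designing the gradient first and integrating it can be illustrated with a small sketch. It assumes a gradient of the form "affine transformation of ELU", with a separate slope on each side of zero, and integrates it from 0 to x in closed form. The parameter names `a_pos`, `a_neg`, and `b` are hypothetical, and this is not necessarily the paper's exact parameterization of xIELU.

```python
import math


def gradient(x: float, a_pos: float, a_neg: float, b: float) -> float:
    """Designed gradient: an affine transformation of ELU (with ELU scale 1).

    ELU(x) = x for x > 0 and exp(x) - 1 for x <= 0, so the gradient grows
    linearly for positive inputs and saturates for negative inputs.
    """
    if x > 0:
        return a_pos * x + b
    return a_neg * (math.exp(x) - 1.0) + b


def integrated(x: float, a_pos: float, a_neg: float, b: float) -> float:
    """Closed-form integral of `gradient` from 0 to x (so integrated(0) == 0)."""
    if x > 0:
        return 0.5 * a_pos * x * x + b * x
    return a_neg * (math.exp(x) - 1.0 - x) + b * x


if __name__ == "__main__":
    # Check numerically that the derivative of the integral matches the gradient.
    a_pos, a_neg, b, h = 0.8, 0.8, 0.5, 1e-5  # placeholder values
    for x in (-2.0, -0.5, 0.5, 2.0):
        numeric = (integrated(x + h, a_pos, a_neg, b)
                   - integrated(x - h, a_pos, a_neg, b)) / (2 * h)
        print(x, numeric, gradient(x, a_pos, a_neg, b))
```

The positive branch is quadratic, which is why the abstract compares it to ReLU$^2$, while the negative branch keeps a nonzero gradient instead of cutting it off.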


Activation function optimization method: Learnable series linear units (LSLUs)

Feng, Chuan, Lin, Xi, Zhu, Shiping, Shi, Hongkang, Tang, Maojie, Huang, Hua

arXiv.org Artificial Intelligence

Effective activation functions introduce non-linear transformations, providing neural networks with stronger fitting capabilities, which help them better adapt to real data distributions. Huawei Noah's Lab believes that dynamic activation functions are more suitable than static activation functions for enhancing the non-linear capabilities of neural networks. Tsinghua University's related research also suggests using dynamically adjusted activation functions. Building on the ideas of using fine-tuned activation functions from Tsinghua University and Huawei Noah's Lab, we propose a series-based learnable activation function called LSLU (Learnable Series Linear Units). This method simplifies deep learning networks while improving accuracy. It introduces learnable parameters $\theta$ and $\omega$ to control the activation function, adapting it to the current layer's training stage and improving the model's generalization. The principle is to increase non-linearity in each activation layer, boosting the network's overall non-linearity. We evaluate LSLU's performance on CIFAR10, CIFAR100, and task-specific datasets (e.g., Silkworm), validating its effectiveness. The convergence behavior of the learnable parameters $\theta$ and $\omega$, as well as their effects on generalization, are analyzed. Our empirical results show that LSLU enhances the generalization ability of the original model in various tasks while speeding up training. In VanillaNet training, parameter $\theta$ initially decreases, then increases before stabilizing, while $\omega$ shows the opposite trend. Ultimately, LSLU achieves a 3.17% accuracy improvement on CIFAR100 for VanillaNet (Table 3). Code is available at https://github.com/vontran2021/Learnable-series-linear-units-LSLU.


Deep Learning Activation Functions: Fixed-Shape, Parametric, Adaptive, Stochastic, Miscellaneous, Non-Standard, Ensemble

Hammad, M. M.

arXiv.org Artificial Intelligence

In the architecture of deep learning models, inspired by biological neurons, activation functions (AFs) play a pivotal role. They significantly influence the performance of artificial neural networks. By modulating the non-linear properties essential for learning complex patterns, AFs are fundamental in both classification and regression tasks. This paper presents a comprehensive review of various types of AFs, including fixed-shape, parametric, adaptive, stochastic/probabilistic, non-standard, and ensemble/combining types. We begin with a systematic taxonomy and detailed classification frameworks that delineate the principal characteristics of AFs and organize them based on their structural and functional distinctions. Our in-depth analysis covers primary groups such as sigmoid-based, ReLU-based, and ELU-based AFs, discussing their theoretical foundations, mathematical formulations, and specific benefits and limitations in different contexts. We also highlight key attributes of AFs such as output range, monotonicity, and smoothness. Furthermore, we explore miscellaneous AFs that do not conform to these categories but have shown unique advantages in specialized applications. Non-standard AFs are also explored, showcasing cutting-edge variations that challenge traditional paradigms and offer enhanced adaptability and model performance. We examine strategies for combining multiple AFs to leverage complementary properties. The paper concludes with a comparative evaluation of 12 state-of-the-art AFs, using rigorous statistical and experimental methodologies to assess their efficacy. This analysis not only aids practitioners in selecting and designing the most appropriate AFs for their specific deep learning tasks but also encourages continued innovation in AF development within the machine learning community.


Neural Networks with (Low-Precision) Polynomial Approximations: New Insights and Techniques for Accuracy Improvement

Zhang, Chi, Fan, Jingjing, Au, Man Ho, Yiu, Siu Ming

arXiv.org Artificial Intelligence

Replacing non-polynomial functions (e.g., non-linear activation functions such as ReLU) in a neural network with their polynomial approximations is a standard practice in privacy-preserving machine learning. The resulting neural network, called a polynomial approximation of a neural network (PANN) in this paper, is compatible with advanced cryptosystems to enable privacy-preserving model inference. Using "highly precise" approximation, state-of-the-art PANN offers inference accuracy similar to that of the underlying backbone model. However, little is known about the effect of approximation, and existing literature often determines the required approximation precision empirically. In this paper, we initiate the investigation of PANN as a standalone object. Specifically, our contribution is two-fold. Firstly, we provide an explanation of the effect of approximation error in PANN. In particular, we discovered that (1) PANN is susceptible to certain types of perturbations; and (2) weight regularisation significantly reduces PANN's accuracy. We support our explanation with experiments. Secondly, based on the insights from our investigations, we propose solutions to increase inference accuracy for PANN. Experiments showed that the combination of our solutions is very effective: at the same precision, our PANN is 10% to 50% more accurate than the state of the art; and at the same accuracy, our PANN only requires a precision of $2^{-9}$ while the state-of-the-art solution requires a precision of $2^{-12}$, using the ResNet-20 model on the CIFAR-10 dataset.
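To make the notion of "polynomial approximation of ReLU" concrete, here is a minimal sketch that fits a low-degree polynomial to ReLU on the interval [-1, 1] and measures the worst-case error. It uses a generic least-squares fit solved with plain Gaussian elimination; the interval, the sample counts, the degrees, and all function names are assumptions for illustration, and this is not the approximation scheme used in the paper.

```python
def relu(x: float) -> float:
    return max(0.0, x)


def fit_polynomial(func, degree, lo=-1.0, hi=1.0, samples=201):
    """Least-squares polynomial fit; returns coefficients, lowest degree first."""
    xs = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    n = degree + 1
    # Normal equations A c = b, with A[j][k] = sum(x^(j+k)), b[j] = sum(func(x) * x^j).
    a = [[sum(x ** (j + k) for x in xs) for k in range(n)] for j in range(n)]
    b = [sum(func(x) * x ** j for x in xs) for j in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the polynomial with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def max_error(func, coeffs, lo=-1.0, hi=1.0, samples=1001):
    """Largest absolute gap between func and the polynomial on a sample grid."""
    xs = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    return max(abs(func(x) - evaluate(coeffs, x)) for x in xs)


if __name__ == "__main__":
    for degree in (2, 4):
        coeffs = fit_polynomial(relu, degree)
        print(degree, max_error(relu, coeffs))
```

Raising the degree lowers the worst-case error on the fitted interval, which is the precision knob the abstract refers to; outside the interval the polynomial can diverge from ReLU quickly.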


Exploring the Relationship: Transformative Adaptive Activation Functions in Comparison to Other Activation Functions

Kunc, Vladimír

arXiv.org Artificial Intelligence

Neural networks are the state-of-the-art approach for many tasks and the activation function is one of the main building blocks that allow such performance. Recently, a novel transformative adaptive activation function (TAAF) allowing for any vertical and horizontal translation and scaling was proposed. This work sets the TAAF into the context of other activation functions. It shows that the TAAFs generalize over 50 existing activation functions and utilize similar concepts as over 70 other activation functions, underscoring the versatility of TAAFs. This comprehensive exploration positions TAAFs as a promising and adaptable addition to neural networks.
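The "vertical and horizontal translation and scaling" described here can be sketched as a wrapper around an inner activation function. The four parameter names (`alpha`, `beta`, `gamma`, `delta`) and the helper `taaf` are assumptions made for this illustration; in an adaptive setting they would be learned rather than fixed. As one example of generalizing an existing function, the logistic sigmoid is recovered from tanh through the identity sigmoid(x) = 0.5 * tanh(0.5 * x) + 0.5.

```python
import math


def taaf(inner, alpha, beta, gamma, delta):
    """Wrap `inner` with vertical scaling (alpha), horizontal scaling (beta),
    horizontal translation (gamma), and vertical translation (delta)."""
    def activation(x):
        return alpha * inner(beta * x + gamma) + delta
    return activation


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# The logistic sigmoid expressed as a transformed tanh.
sigmoid_from_tanh = taaf(math.tanh, alpha=0.5, beta=0.5, gamma=0.0, delta=0.5)

if __name__ == "__main__":
    for x in (-3.0, -1.0, 0.0, 2.0, 5.0):
        print(x, sigmoid(x), sigmoid_from_tanh(x))
```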


Swim: A General-Purpose, High-Performing, and Efficient Activation Function for Locomotion Control Tasks

Abdool, Maryam, Dear, Tony

arXiv.org Artificial Intelligence

Activation functions play a significant role in the performance of deep learning algorithms. In particular, the Swish activation function tends to outperform ReLU on deeper models, including deep reinforcement learning models, across challenging tasks. Despite this progress, ReLU remains the preferred function, partly because it is more efficient than Swish. Furthermore, in contrast to the fields of computer vision and natural language processing, the deep reinforcement learning and robotics domains have been less inclined to adopt new activation functions, such as Swish, and instead continue to use more traditional functions, like ReLU. To tackle these issues, we propose Swim, a general-purpose, efficient, and high-performing alternative to Swish, and then provide an analysis of its properties as well as an explanation for its high performance relative to Swish, in terms of both reward achievement and efficiency. We focus on testing Swim on MuJoCo's locomotion continuous control tasks since they exhibit more complex dynamics and would therefore benefit most from a high-performing and efficient activation function. We also use the TD3 algorithm in conjunction with Swim and explain this choice in the context of the robot locomotion domain. We then conclude that Swim is a state-of-the-art activation function for continuous control locomotion tasks and recommend using it with TD3 as a working framework.


Arachne: Search Based Repair of Deep Neural Networks

Sohn, Jeongju, Kang, Sungmin, Yoo, Shin

arXiv.org Artificial Intelligence

The rapid and widespread adoption of Deep Neural Networks (DNNs) has called for ways to test their behaviour, and many testing approaches have successfully revealed misbehaviour of DNNs. However, it is relatively unclear what one can do to correct such behaviour once it is revealed, as retraining involves costly data collection and is not guaranteed to fix the underlying issue. This paper introduces Arachne, a novel program repair technique for DNNs, which directly repairs DNNs using their input-output pairs as a specification. Arachne localises neural weights on which it can generate effective patches and uses Differential Evolution to optimise the localised weights and correct the misbehaviour. An empirical study using different benchmarks shows that Arachne can fix specific misclassifications of a DNN without reducing general accuracy significantly. On average, patches generated by Arachne generalise to 61.3% of unseen misbehaviour, whereas those by a state-of-the-art DNN repair technique generalise only to 10.2%, and sometimes to none, while taking tens of times longer than Arachne. We also show that Arachne can address fairness issues by debiasing a gender classification model. Finally, we successfully apply Arachne to a text sentiment model to show that it generalises beyond Convolutional Neural Networks.


Introduction to Deep Learning

#artificialintelligence

This article is based on content from Global AI Hub. I have prepared it to explain that material and make sense of it intuitively. Please visit the Global AI Hub and show your appreciation; it is a great free resource for Data Science.[1] I suggest you read my "Introduction to Machine Learning" article before reading this one (you can still follow along without it). Deep learning is a type of learning based on artificial neural networks. There is no exact threshold, but in general, an ANN (artificial neural network) with many (5 or more) hidden layers is considered deep (structured) learning. Artificial neural networks are computing systems inspired by the biological neural networks in animal brains. An ANN is made up of a network of interconnected units or nodes called artificial neurons. Each node in an artificial neural network is called a neuron because it works like a biological neuron, as you can see below.


An Introduction to Rectified Linear Unit (ReLU)

#artificialintelligence

Artificial neural networks are inspired by the biological neurons in the human body, which activate under certain circumstances and trigger a related action by the body in response. Artificial neural nets consist of layers of interconnected artificial neurons powered by activation functions, which help switch them ON or OFF. As with traditional machine learning algorithms, there are certain values that neural nets learn in the training phase. Briefly, each neuron multiplies its inputs by weights (initialized randomly), sums the results, and adds a bias value; this sum is then passed to an appropriate activation function, which decides the final value output by the neuron. Various activation functions are available, depending on the nature of the input values.
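The neuron computation described in this snippet can be sketched in a few lines. This is a minimal illustration with ReLU as the activation function; the function names and the example numbers are made up for the sketch, and a real framework would operate on whole layers with tensors rather than on one neuron with Python lists.

```python
def relu(z: float) -> float:
    """Rectified Linear Unit: pass positive values through, clamp the rest to 0."""
    return max(0.0, z)


def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through ReLU."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return relu(z)


if __name__ == "__main__":
    # Pre-activation is 0.5*1.0 + 1.0*2.0 + 0.25 = 2.75, so the neuron is "ON".
    print(neuron_output([1.0, 2.0], [0.5, 1.0], 0.25))
    # Pre-activation is 0.5*1.0 - 1.0*2.0 + 0.25 = -1.25, so ReLU outputs 0.0 ("OFF").
    print(neuron_output([1.0, 2.0], [0.5, -1.0], 0.25))
```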